Fast and easy development of pronunciation lexicons for names

Authors

  • Henk van den Heuvel
  • Jean-Pierre Martens
  • Nanneke Konings
Abstract

We show that a good approach for the grapheme-to-phoneme conversion of Dutch proper names (e.g. person names, toponyms, etc.) is to use a cascade of a general-purpose grapheme-to-phoneme (G2P) converter and a special-purpose phoneme-to-phoneme (P2P) converter. The G2P converter produces an initial transcription, which the P2P converter then transforms. The P2P converter is trained automatically on reference transcriptions of names belonging to the envisaged name category (e.g. toponyms). The P2P learning process is designed so that it can take into account high-order determinants of pronunciation, such as specific syllables, name prefixes and name suffixes. The proposed methodology was successfully tested on person names and toponyms, but we believe that it will also substantially reduce the cost of building pronunciation lexicons for other name categories.
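To make the G2P to P2P cascade concrete, here is a minimal Python sketch. It is not the authors' learner, and everything in it is a hypothetical stand-in: the toy G2P rules, the functions `g2p`, `train_p2p` and `apply_p2p`, the use of the last three letters of the name as a "name suffix" feature, the invented SAMPA-like reference transcriptions, and the assumption that initial and reference transcriptions align one-to-one. The P2P stage counts, for each context, which reference phoneme the initial phoneme should become. It then applies the most specific learned context first (name suffix plus neighbouring phonemes), backs off to neighbouring phonemes only, and otherwise leaves the G2P output unchanged.

```python
from collections import Counter, defaultdict

# Hypothetical toy G2P: greedy longest-match grapheme rules. A real cascade would
# use a general-purpose Dutch G2P converter here; these rules are illustrative only.
MULTI = {"sch": ["s", "x"], "ch": ["x"], "ij": ["Ei"], "aa": ["a:"],
         "ee": ["e:"], "oo": ["o:"], "oe": ["u"]}
SINGLE = {"g": "x"}  # letters not listed map to themselves


def g2p(name):
    """Initial transcription of a name as a list of phoneme symbols."""
    text, out, i = name.lower(), [], 0
    while i < len(text):
        for size in (3, 2):  # try the longest grapheme chunk first
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in MULTI:
                out.extend(MULTI[chunk])
                i += size
                break
        else:  # no multi-letter match: map a single letter
            if text[i].isalpha():
                out.append(SINGLE.get(text[i], text[i]))
            i += 1
    return out


def contexts(phones, i, suffix):
    """Contexts for position i, most specific first ('#' marks a word boundary)."""
    left = phones[i - 1] if i > 0 else "#"
    right = phones[i + 1] if i + 1 < len(phones) else "#"
    return [(suffix, left, phones[i], right), (left, phones[i], right)]


def train_p2p(examples, min_count=1):
    """Learn context -> output-phoneme rules from (name, initial, reference) triples.

    Simplifying assumption: initial and reference transcriptions are aligned
    one-to-one; a real trainer needs an aligner that handles insertions/deletions.
    """
    counts = defaultdict(Counter)
    for name, init, ref in examples:
        if len(init) != len(ref):
            raise ValueError(f"unaligned pair for {name!r}")
        suffix = name.lower()[-3:]  # hypothetical name-suffix feature
        for i, target in enumerate(ref):
            for ctx in contexts(init, i, suffix):
                counts[ctx][target] += 1
    rules = {}
    for ctx, counter in counts.items():
        target, n = counter.most_common(1)[0]  # majority output for this context
        if n >= min_count:
            rules[ctx] = target
    return rules


def apply_p2p(rules, name, init):
    """Rewrite an initial transcription with the most specific matching rule."""
    suffix = name.lower()[-3:]
    out = []
    for i, phone in enumerate(init):
        for ctx in contexts(init, i, suffix):
            if ctx in rules:
                out.append(rules[ctx])
                break
        else:
            out.append(phone)  # unseen context: keep the G2P output unchanged
    return out


# Invented toy reference transcriptions (SAMPA-like symbols), for illustration only.
REFERENCE = {
    "Jansen": ["j", "A", "n", "s", "@", "n"],
    "Hansen": ["h", "A", "n", "s", "@", "n"],
    "Roosendaal": ["r", "o:", "z", "@", "n", "d", "a:", "l"],
}
examples = [(name, g2p(name), ref) for name, ref in REFERENCE.items()]
rules = train_p2p(examples)

print(apply_p2p(rules, "Jensen", g2p("Jensen")))  # -> ['j', 'e', 'n', 's', '@', 'n']
```

In this run, the "-sen" rules learned from *Jansen* and *Hansen* turn the final `e` of *Jensen* into `@`. The first `e` is left exactly as the toy G2P produced it, because that context never occurred in training. Because the suffix-conditioned context is checked before the neighbour-only context, a rule learned for one name ending can override a more general rule. This is one simple way to let name-level determinants such as suffixes steer the P2P rewrite.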


Similar resources

CrossTowns: Automatically Generated Phonetic Lexicons of Cross-lingual Pronunciation Variants of European City Names

The CrossTowns lexicons are part of a study that focuses on the phonetic variants that occur when speakers of different native languages (L1) with varying degrees of target language (L2) proficiency pronounce foreign city names. Based on a collection of speech data from this domain, one of the aims is to identify the most common pronunciation errors in a particular L1/L2 pair (language direc...

Recognition of foreign names spoken by native speakers

It is a challenge to develop a speech recognizer that can handle the kind of lexicons encountered in an automatic attendant or car navigation application. Such lexicons can contain several hundred thousand entries, mainly proper names. Many of these names are of a foreign origin, and native speakers can pronounce them in different ways, ranging from a completely nativized to a completely foreignized pronun...

G2P conversion of names: what can we do (better)?

In this contribution it is shown that a good approach for the grapheme-to-phoneme conversion of proper names (e.g. person names, toponyms, etc.) is to use a cascade of a general-purpose grapheme-to-phoneme (G2P) converter and a special-purpose phoneme-to-phoneme (P2P) converter. The G2P produces an initial transcription that is then transformed by the P2P. The latter is automatically trained on...

On designing pronunciation lexicons for large vocabulary, continuous speech recognition

Creation of pronunciation lexicons for speech recognition is widely acknowledged to be an important, but labor-intensive, aspect of system development. Lexicons are often manually created and make use of knowledge and expertise that is difficult to codify. In this paper we describe our American English lexicon developed primarily for the ARPA WSJ/NAB tasks. The lexicon is phonemically represent...

Extending Pronunciation Lexicons via Non-phonemic Respellings

This paper describes work in progress towards using non-phonemic respellings as an additional source of information besides spelling in the process of extending pronunciation lexicons for speech recognition and text-to-speech systems. Preliminary experimental data indicates that the approach is likely to be successful. The major benefit of the approach is that it makes extending pronunciation le...

Publication date: 2007